Skip to content

Fix excessive thread name lookups#790

Open
tdryer wants to merge 2 commits into
benfred:masterfrom
tdryer:thread-name-lookups
Open

Fix excessive thread name lookups#790
tdryer wants to merge 2 commits into
benfred:masterfrom
tdryer:thread-name-lookups

Conversation

@tdryer

@tdryer tdryer commented Jul 29, 2025

Copy link
Copy Markdown

py-spy uses a cache to minimize the overhead of looking up Python thread names during sampling. If looking up a thread name fails, the entire cache is rebuilt. However, if a failure is persistent, this causes the cache to be rebuilt during every sample (potentially more than once), creating excessive overhead.

Two cases when persistent thread name lookup failures can occur are:

  • When the threading module has not been imported
  • When a thread was created outside of the threading module, such as through _thread.start_new_thread

Fix this by modifying the cache to store a negative response (None) if the thread name is still not found after rebuilding the cache. To prevent negative responses from being lost when more than one thread name lookup fails, add a flag preventing the cache from being rebuilt more than once per sample.

One disadvantage of this approach is that in nonblocking mode, a spurious negative response can be cached for an extended duration. This happens if the thread is created during a sample and after the single allowed thread name lookup happens.

tdryer and others added 2 commits July 29, 2025 11:19
py-spy uses a cache to minimize the overhead of looking up Python thread
names during sampling. If looking up a thread name fails, the entire
cache is rebuilt. However, if a failure is persistent, this causes the
cache to be rebuilt during every sample (potentially more than once),
creating excessive overhead.

Two cases when persistent thread name lookup failures can occur are:

* When the `threading` module has not been imported
* When a thread was created outside of the `threading` module, such as
  through `_thread.start_new_thread`

Fix this by modifying the cache to store a negative response (`None`) if
the thread name is still not found after rebuilding the cache. To
prevent negative responses from being lost when more than one thread
name lookup fails, add a flag preventing the cache from being rebuilt
more than once per sample.

One disadvantage of this approach is that in nonblocking mode, a
spurious negative response can be cached for an extended duration. This
happens if the thread is created during a sample and after the single
allowed thread name lookup happens.
@benfred benfred added the enhancement New feature or request label Apr 21, 2026
@cschanhniem

Copy link
Copy Markdown

I traced the code path. The issue is in python_spy.rs in _get_python_thread_name (around line 330):

fn _get_python_thread_name(&mut self, python_thread_id: u64) -> Option<String> {
    match self.python_thread_names.get(&python_thread_id) {
        Some(thread_name) => Some(thread_name.clone()),
        None => {
            self.python_thread_names = thread_name_lookup(self).unwrap_or_default();
            self.python_thread_names.get(&python_thread_id).cloned()
        }
    }
}

Two compounding problems:

1. O(N^2) per sample: _get_python_thread_name is called inside the thread loop in _get_stack_traces, once per PyThreadState. When N threads are uncached, the full thread_name_lookup (which traverses the entire threading._active dict) runs N times per sample instead of once.

2. No negative caching: When thread_name_lookup fails (threading module not imported, or threads created outside threading), unwrap_or_default() returns an empty HashMap. Every subsequent thread in the same sample — and every sample after — also misses, triggering the same failing lookup.

Fix: move the lookup out of the per-thread function and into _get_stack_traces, running it once per sample call, and add a bool flag to skip re-attempt after the first failure:

// Add to PythonSpy struct:
pub thread_name_lookup_attempted: bool,

// In _get_stack_traces, before the thread loop:
if !self.thread_name_lookup_attempted {
    self.python_thread_names = thread_name_lookup(self).unwrap_or_default();
    self.thread_name_lookup_attempted = true;
}

// Simplified _get_python_thread_name:
fn _get_python_thread_name(&mut self, python_thread_id: u64) -> Option<String> {
    self.python_thread_names.get(&python_thread_id).cloned()
}

This ensures the full dict walk happens exactly once per sample (or once ever if the threading module is absent). The stale-cache issue from recycled TIDs is already handled in _get_os_thread_id which clears python_thread_names when it detects a recycled TID, so correctness is preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants